Identification of Author Personality Traits using Stylistic Features: Notebook for PAN at CLEF 2015
Abstract
Author profiling is the task of determining an author's age, gender, or personality type by studying the sociolect aspect of their writing, that is, how language is shared among people. This paper presents the COMSATS Institute of Information Technology, Lahore entry for the Author Profiling task of the PAN 2015 competition. Our proposed system is based on stylometric features. We implemented 29 different stylistic features, many of which are language independent. Since the training data was available in multiple languages, one of our main objectives was to explore which language-independent features are most effective. The problem of author profiling was cast as a supervised document classification task. The results showed that five features (Percentage of Question Sentences, Average Sentence Length, Percentage of Punctuation, Percentage of Commas, and Percentage of Full Stops) were the most effective multilingual features.
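The five features named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything below is an assumption made for illustration: the function name `stylistic_features`, the sentence-splitting rule, and the choice of denominators (sentences for the question percentage, non-whitespace characters for the punctuation percentages). It is not the authors' implementation.

```python
import re
import unicodedata


def stylistic_features(text):
    """Return five stylistic features for one document (illustrative definitions)."""
    # Assumption: a sentence is a run of text closed by '.', '!' or '?',
    # or by the end of the document.
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]*", text)]
    sentences = [s for s in sentences if s]

    # Assumption: character-level percentages ignore whitespace.
    chars = [c for c in text if not c.isspace()]

    def pct(count, total):
        # Percentage with a guard for empty documents.
        return 100.0 * count / total if total else 0.0

    n_sent = len(sentences)
    n_words = sum(len(s.split()) for s in sentences)
    # Unicode categories starting with "P" cover punctuation in any script,
    # which keeps this feature language independent.
    n_punct = sum(unicodedata.category(c).startswith("P") for c in chars)

    return {
        "pct_question_sentences": pct(sum(s.endswith("?") for s in sentences), n_sent),
        "avg_sentence_length": n_words / n_sent if n_sent else 0.0,  # words per sentence
        "pct_punctuation": pct(n_punct, len(chars)),
        "pct_comma": pct(chars.count(","), len(chars)),
        "pct_full_stop": pct(chars.count("."), len(chars)),
    }
```

Calling `stylistic_features("Is it late? Yes, it is. Go home now.")` returns a dictionary with one value per feature. Such a vector could then be passed to any supervised classifier, in line with the document classification framing described in the abstract.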
Related papers
Author Profiling of Twitter Users: Notebook for PAN at CLEF 2015
In this paper, we focused on profiling authors by age, gender, and five personality traits. The corpus consists of anonymized Twitter posts in four different languages. Our approach combined tf-idf, function words, stylistic features, and text bigrams, with an SVM trained for each task.
UniNE at CLEF 2015 Author Profiling: Notebook for PAN at CLEF 2015
This paper describes and evaluates an effective author profiling model called SPATIUM-L1. The suggested strategy can be adapted without any problem to different languages (such as Dutch, English, Italian, and Spanish) in Twitter tweets. As features, we suggest using the 200 most frequent terms of the query text (isolated words and punctuation symbols). Applying a simple distance measure and loo...
XRCE Personal Language Analytics Engine for Multilingual Author Profiling: Notebook for PAN at CLEF 2015
This technical notebook describes the methodology used – and results achieved – for the PAN 2015 Author Profiling Challenge by the team from Xerox Research Centre Europe (XRCE). This year, personality traits are introduced alongside age and gender in a corpus of tweets in four languages – English, Spanish, Italian and Dutch. We describe a largely language agnostic methodology for classification...
Syntactic N-grams as Features for the Author Profiling Task: Notebook for PAN at CLEF 2015
This paper describes our approach to the Author Profiling task at PAN 2015. Our method relies on syntactic features, such as syntax-based n-grams of various types, to predict the age, gender, and personality traits of the author of a given text. In this paper, we describe the features used, the employed classification algorithm, and other general ideas concerning the expe...
Automatic Author Profiling Based on Linguistic and Stylistic Features: Notebook for PAN at CLEF 2013
Blogs and other electronic data are expanding rapidly in Web 2.0, so it is becoming increasingly important to identify the author's profile as well. The automatic identification of an author's gender and age from linguistic and stylistic patterns has been a subject of increasing research interest in recent years. The research methodologies are also helpful for several other appli...
Publication date: 2015